[SPARK-57556][SQL] Raise a clear error for the TIME data type in Hive SerDe interop by MaxGekk · Pull Request #56850 · apache/spark

MaxGekk · 2026-06-28T15:29:42Z

What changes were proposed in this pull request?

Apache Hive has no TIME type, so TimeType has no faithful representation in Hive SerDe interop. This PR (the Option B / "clear, documented error" path from SPARK-57556) makes TimeType produce a clear AnalysisException instead of a scala.MatchError/internal error when it reaches the HiveInspectors mapping functions, and rejects it in the Hive SerDe write path:

HiveInspectors.toInspector(dataType), toInspector(expr) (TIME literal) and toTypeInfo now throw UNSUPPORTED_DATATYPE via a shared unsupportedHiveType helper. Previously toInspector(dataType) had no TimeType case and no default branch, so a TIME column hit a raw scala.MatchError.
HiveFileFormat.supportDataType rejects TimeType (recursing into nested struct/array/map/UDT types, preserving the prior default for all other types) so Hive SerDe writes raise UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE (format Hive) via FileFormatWriter.verifySchema.
Documented the limitation on the TIME entry in docs/sql-ref-datatypes.md.

Why are the changes needed?

HiveInspectors had no TimeType case, so object-inspector creation and TypeInfo mapping fell through to a MatchError/internal error when a TIME column or literal reached Hive SerDe paths (for example, a TIME argument to a Hive UDF/UDAF/UDTF). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590).

Does this PR introduce any user-facing change?

Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a clear error that names the unsupported TIME type, instead of a MatchError/internal error. For example, SELECT myHiveUDF(TIME'12:01:02') now reports [UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)" (wrapped by the Hive UDF resolver), and writing a TIME column through the Hive SerDe write path reports [UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource doesn't support the column ... of the type "TIME(6)".

How was this patch tested?

Added tests and ran them locally (build/sbt 'hive/testOnly *HiveInspectorSuite *HiveUDFSuite *InsertSuite'):

HiveInspectorSuite: toInspector(TimeType()), a TIME literal, and TimeType().toTypeInfo raise UNSUPPORTED_DATATYPE.
HiveUDFSuite: passing TIME'12:01:02' to a Hive GenericUDFHash fails with a message naming the unsupported TIME type.
InsertSuite: INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET SELECT TIME'...' (with spark.sql.hive.convertMetastoreInsertDir=false) raises UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk · 2026-06-29T06:53:55Z

Could you review when you have a moment? cc @cloud-fan @dongjoon-hyun

MaxGekk · 2026-06-29T10:53:46Z

@uros-b Could you review this PR, please.

uros-b · 2026-06-29T14:44:11Z

+      assert(e.getMessage.contains("UNSUPPORTED_DATATYPE"))
+      assert(e.getMessage.contains(TimeType().sql))


Nit: please use checkError instead of assert/contains.

I looked into this but checkError does not cleanly apply here. The Hive UDF resolver in HiveSessionStateBuilder.makeFunctionExpression catches the inner UNSUPPORTED_DATATYPE and re-wraps it into a new _LEGACY_ERROR_TEMP_3084 AnalysisException, capturing the original only as the e string parameter (e.toString) and copying its stack trace -- it does not attach it as a cause (the 2-arg AnalysisException(errorClass, messageParameters) constructor sets cause = None). So getCause is null and there is no inner condition to assert against; the only structured option would be checkError on _LEGACY_ERROR_TEMP_3084 with a brittle full-message e param.

I kept assert(...contains...) and filed SPARK-57750 (sub-task of SPARK-37935) to give that legacy error a proper name and attach the cause, after which this can become a clean nested checkError.

It would be better to merge this #56867 first of all.

uros-b · 2026-06-29T14:44:39Z

+          condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+          parameters = Map(
+            "columnName" -> "`c`",
+            "columnType" -> s"\"${ArrayType(TimeType()).sql}\"",


Nit: how about map/struct? Shall we do those too?

uros-b · 2026-06-29T14:45:40Z

+      }
+    }
+  }
+


Nit: how about a more direct table-write case, e.g. INSERT INTO <metastore Hive serde table>?

Good idea, but it turns out a Hive serde table cannot hold a TIME column in the first place, so a managed-table INSERT INTO never reaches the new write-path check. CREATE TABLE t (c TIME) STORED AS PARQUET fails at metastore creation:

AnalysisException: HiveException: IllegalArgumentException: Error: type expected at the position 0 of 'time(6)' but 'time' is found. at HiveExternalCatalog.createTable ... at CreateTableCommand.run

HiveClientImpl.toHiveColumn emits the catalog string time(6) as the Hive type, which Hive cannot parse. So the INSERT OVERWRITE DIRECTORY ... STORED AS cases (scalar + nested) are the ones that exercise HiveFileFormat.supportDataType directly; I left this as-is.

uros-b · 2026-06-29T14:45:50Z

  time-zone.
  - `TimeType(precision)`: Represents values comprising values of fields hour, minute and second with the number of decimal digits `precision` following the decimal point in the seconds field, without a time-zone.
  The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999999` for max precision `9`. The default precision is `6`.
+    - Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.


Suggested change

- Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.

- Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.

uros-b

Left a few comments, but otherwise LGTM - thank you @MaxGekk!

MaxGekk · 2026-06-29T15:57:55Z

Thank you for the reviews, @uros-b! Addressed your comments: applied the doc note suggestion, extended the nested-type coverage to map and struct, and replied inline on the direct table-write case and the checkError suggestion (the latter tracked by SPARK-57750).

dongjoon-hyun

+1, LGTM. Thank you, @MaxGekk .

MaxGekk · 2026-06-29T19:28:01Z

Follow-up for the checkError discussion above: I opened #56867 to give _LEGACY_ERROR_TEMP_3084 a proper name and attach the original exception as the cause (SPARK-57750). Once that lands, the Hive UDF test here can assert the inner error directly via checkError. @uros-b would you mind taking a look at #56867?

cloud-fan

0 blocking, 0 non-blocking, 0 nits.
Clean, well-scoped change: replaces a cryptic MatchError with a clear, documented UNSUPPORTED_DATATYPE error, faithfully follows the SPARK-51590 / OrcFileFormat.supportDataType analogue (the case _ => true default is the correct choice for Hive SerDe's broader supported-type set), and is well tested across all three affected surfaces.

One optional, non-blocking observation: toInspector(DataType) still has no default branch, so other unmapped types (e.g. VariantType reached as a Hive UDF argument) would still surface a raw MatchError. Pre-existing and outside this PR's TIME scope -- noting only for awareness, not as a request.

… SerDe interop ### What changes were proposed in this pull request? Apache Hive has no TIME type, so `TimeType` has no faithful representation in Hive SerDe interop. This PR makes `TimeType` produce a clear `AnalysisException` instead of a `scala.MatchError` or an internal error when it reaches the `HiveInspectors` mapping functions, and rejects it in the Hive SerDe write path: - `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) and `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared helper. - `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested types) so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. ### Why are the changes needed? `HiveInspectors` had no `TimeType` case, so object-inspector creation and TypeInfo mapping fell through to a `MatchError`/internal error when a TIME column or literal reached Hive SerDe paths (e.g. a Hive UDF argument). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590). ### Does this PR introduce any user-facing change? Yes. Using TIME with Hive UDFs or Hive SerDe writes now fails with a clear error message naming the unsupported TIME type rather than a MatchError/internal error. ### How was this patch tested? Added tests in `HiveInspectorSuite`, `HiveUDFSuite` and `InsertSuite`, and documented the limitation in `docs/sql-ref-datatypes.md`.

MaxGekk · 2026-06-30T16:44:31Z

Merging to master/4.x. Thank you, @dongjoon-hyun @uros-b @cloud-fan for review.

… SerDe interop ### What changes were proposed in this pull request? Apache Hive has no TIME type, so `TimeType` has no faithful representation in Hive SerDe interop. This PR (the Option B / "clear, documented error" path from [SPARK-57556](https://issues.apache.org/jira/browse/SPARK-57556)) makes `TimeType` produce a clear `AnalysisException` instead of a `scala.MatchError`/internal error when it reaches the `HiveInspectors` mapping functions, and rejects it in the Hive SerDe write path: - `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) and `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared `unsupportedHiveType` helper. Previously `toInspector(dataType)` had no `TimeType` case and no default branch, so a TIME column hit a raw `scala.MatchError`. - `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested struct/array/map/UDT types, preserving the prior default for all other types) so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` (format `Hive`) via `FileFormatWriter.verifySchema`. - Documented the limitation on the TIME entry in `docs/sql-ref-datatypes.md`. ### Why are the changes needed? `HiveInspectors` had no `TimeType` case, so object-inspector creation and TypeInfo mapping fell through to a `MatchError`/internal error when a TIME column or literal reached Hive SerDe paths (for example, a TIME argument to a Hive UDF/UDAF/UDTF). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590). ### Does this PR introduce _any_ user-facing change? Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a clear error that names the unsupported TIME type, instead of a `MatchError`/internal error. For example, `SELECT myHiveUDF(TIME'12:01:02')` now reports `[UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)"` (wrapped by the Hive UDF resolver), and writing a TIME column through the Hive SerDe write path reports `[UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource doesn't support the column ... of the type "TIME(6)"`. ### How was this patch tested? Added tests and ran them locally (`build/sbt 'hive/testOnly *HiveInspectorSuite *HiveUDFSuite *InsertSuite'`): - `HiveInspectorSuite`: `toInspector(TimeType())`, a TIME literal, and `TimeType().toTypeInfo` raise `UNSUPPORTED_DATATYPE`. - `HiveUDFSuite`: passing `TIME'12:01:02'` to a Hive `GenericUDFHash` fails with a message naming the unsupported TIME type. - `InsertSuite`: `INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET SELECT TIME'...'` (with `spark.sql.hive.convertMetastoreInsertDir=false`) raises `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8) Closes #56850 from MaxGekk/time-hive-serde. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit e80f420) Signed-off-by: Max Gekk <max.gekk@gmail.com>

uros-b reviewed Jun 29, 2026

View reviewed changes

uros-b approved these changes Jun 29, 2026

View reviewed changes

dongjoon-hyun approved these changes Jun 29, 2026

View reviewed changes

cloud-fan approved these changes Jun 30, 2026

View reviewed changes

MaxGekk force-pushed the time-hive-serde branch from ca1d470 to 788f731 Compare June 30, 2026 15:08

MaxGekk closed this in e80f420 Jun 30, 2026

		assert(e.getMessage.contains("UNSUPPORTED_DATATYPE"))
		assert(e.getMessage.contains(TimeType().sql))

	- Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.
	- Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.

Uh oh!

Conversation

MaxGekk commented Jun 28, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented Jun 29, 2026

Uh oh!

MaxGekk commented Jun 29, 2026

Uh oh!

uros-b Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 29, 2026

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 29, 2026

Uh oh!

cloud-fan left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants